Clustering with evolution strategies
Abstract
The applicability of evolution strategies (ESs), population-based stochastic optimization techniques, to optimize clustering objective functions is explored. Clustering objective functions are categorized into centroid and non-centroid types. Optimization of the centroid type of objective functions is accomplished by formulating them as functions of real-valued parameters using ESs. Both hard and fuzzy clustering objective functions are considered in this study. The applicability of ESs to discrete optimization problems is extended to optimize the non-centroid type of objective functions. As ESs are amenable to parallelization, a parallel (master/slave) model is described in the context of the clustering problem. Results obtained for selected data sets substantiate the utility of ESs in clustering.

Keywords: Hard clustering; Fuzzy clustering; Optimal partition; Evolution strategies

1. INTRODUCTION

Clustering methods play a vital role in data analysis. They have been effectively applied in different areas including image processing and exploratory data analysis.(1) Various types of clustering algorithms have been proposed to suit different requirements. Broadly, clustering algorithms can be classified into hierarchical and partitional algorithms. Hierarchical methods build a dendrogram structure of the given data, whereas partitional methods divide the data into a specified or estimated (ISODATA)(1) number of non-overlapping clusters. Typically, clustering algorithms aim at optimizing a chosen mathematical objective function. In this paper, we confine our discussion to partitional methods. In general, partitional methods are iterative hill-climbing techniques that optimize the stated objective function and usually converge to a locally optimal partition. Clustering objective functions are highly non-linear and multi-modal; as a consequence, it is very difficult to find the optimal partition of the given data using hill-climbing techniques.
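To make the notion of a centroid-type objective concrete, the following minimal NumPy sketch computes a hard-clustering score of this kind: the total squared distance of each data item to the centroid of its assigned cluster. The function and variable names are our own illustration, not notation from the paper.

```python
import numpy as np

def centroid_objective(data, labels, C):
    """Centroid-type hard clustering objective: sum of squared distances
    of points to the mean (centroid) of their assigned cluster.
    data: (T, d) array, labels: (T,) array of cluster indices in [0, C)."""
    total = 0.0
    for c in range(C):
        members = data[labels == c]
        if len(members) == 0:
            continue  # empty clusters contribute nothing
        centroid = members.mean(axis=0)
        total += ((members - centroid) ** 2).sum()
    return total
```

A hill-climbing partitional method would repeatedly reassign items and recompute centroids to reduce this score, which is exactly where it can get trapped in a local optimum.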
Readers are referred to references (1-3) for a detailed coverage of cluster analysis and its applications. The number of possible partitions of a given data set with T data items into C clusters is given by the Stirling number of the second kind:(2)

S(T, C) = \frac{1}{C!} \sum_{k=1}^{C} (-1)^{C-k} \binom{C}{k} k^{T}    (1)

It can be observed that for T = 100 and C = 2, we need to search 2^{99} - 1 possible partitions in order to arrive at an optimal partition. Exhaustive enumeration is therefore ruled out by its exponential time complexity. Many attempts have been made to reduce the computational effort in finding the optimal partition. Techniques such as integer programming,(4) dynamic programming(5) and branch-and-bound(6) methods have been applied to reduce the computational burden of exhaustive enumeration. Even though these methods are better than exhaustive enumeration, they are still computationally expensive for moderate and large values of T and C (> 2). Stochastic optimization approaches such as simulated annealing and genetic algorithms have been used in the context of the optimal clustering problem.(6,7) Theoretically, it is possible to obtain optimal solutions using these methods, but in practice only near-optimal solutions are obtained except in some special circumstances. References (6, 7) solve the aforementioned problem by formulating it as a discrete optimization problem, i.e. assigning each of the data items to one of the clusters. These approaches take large amounts of time to converge to the globally optimal partition. An alternative way of solving the optimal partition problem is to find optimal initial seed values for which the selected algorithm converges to a globally optimal solution. Both of the above stochastic methods have been applied to select optimal initial seed values(8,9) and were reported to produce superior results. A centroid type of objective function, the within-group sum of squared (WGSS) error measure, was considered.
However, the same approach can be extended to optimize the non-centroid type of objective functions as well. Here, a centroid type of objective function refers to one which depends on the centroids of the clusters formed, whereas a non-centroid type of objective function does not involve cluster centers in its formulation. Hard clustering deals with assigning each data item to exactly one of the clusters, whereas fuzzy clustering extends this concept to associate each data item with each of the clusters through a measure of belongingness. We restrict our discussion to fuzzy C-means (FCM) clustering objective functions. The fuzzy C-means objective function, an extended form of the hard WGSS criterion, was first proposed by Dunn.(10) Later, a generalized form of the FCM algorithm with a family of objective functions J_m(·), 1 ≤ m < ∞, was proposed by Bezdek.(11)
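The family J_m referred to here is conventionally written J_m(U, V) = Σ_i Σ_c u_ic^m ||x_i − v_c||^2, where U is a T × C membership matrix and V the C × d matrix of cluster centers. A minimal NumPy sketch of this standard form (our own naming, assuming Euclidean distance) is:

```python
import numpy as np

def fcm_objective(data, U, V, m=2.0):
    """Fuzzy C-means objective J_m = sum_{i,c} u_ic^m * ||x_i - v_c||^2.
    data: (T, d) items, U: (T, C) memberships (rows sum to 1),
    V: (C, d) cluster centers, m: fuzzifier with 1 <= m < inf."""
    # squared Euclidean distance of every item to every center, shape (T, C)
    d2 = ((data[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    return float((U ** m * d2).sum())
```

With crisp (0/1) memberships and m = 1 this reduces to the hard WGSS criterion, which is the sense in which FCM extends it.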
Journal: Pattern Recognition
Volume: 27
Published: 1994